8 - Hybrid CKI Conference - Keynote: "Biases and shortcuts - towards more robust machine learning" [ID:21888]

So, yeah, thank you. Thank you very much for this quite extensive introduction. I'm very happy to be here today, and today I would like to talk about some of the challenges that I as a researcher face on a day-to-day basis when dealing with artificial intelligence. Namely, I want to talk about how biases and shortcuts impact the models that we train in this area. As Professor Honegger nicely introduced, I'm a researcher in medical imaging, and I want to get computers to support different aspects of patient care: diagnosis, for example, improving treatment strategies, improving disease monitoring, and the like.

And I want to start motivating these shortcuts and biases from a slightly different perspective. If you joined this stream yesterday, you probably heard about potential applications of AI in human resources. And some time ago, if you follow the media in this area, you probably heard about the following case: Amazon was experimenting with a system to assess job applications for automatic interview decisions. However, when they investigated the system and the decisions it had made more closely, they found that it had a considerable bias against women. It is very likely that this behavior was based on prior hiring decisions, so an existing bias in this historical data, collected over the last ten years, was very likely amplified. And this is, of course, not something that an AI is supposed to do when it is working in practice.

Also, quite recently, there was another report of an AI system that behaved unexpectedly. Studies showed that many face recognition algorithms from different vendors depend very strongly on the gender and skin color of a participant. Face recognition software often shows much lower performance, for example, for black females compared to white males. And these are just two examples.

There are many AI algorithms that behave differently in the lab compared to how they behave later in practice, during application. So why do they fail? And why do they fail unexpectedly? Why is it sometimes so challenging to transfer very promising applications to the real world? This question is also relevant in my line of research. Many recent publications deal with the question of robustness: how can we get our algorithms to work across different patient populations, across different scanners, and across different hospitals?

I want to start with a simple example that highlights this very nicely. Imagine that you get a data set from a hospital for an application where, in this case, you want to distinguish malignant and benign lesions. So you start developing an algorithm. What comes to mind when you see this data set is that, well, the malignant lesions have a different color compared to the benign lesions. And many state-of-the-art algorithms, many state-of-the-art AI approaches, will also find color as a discriminative feature, because it is very easy for them to pick this feature up. So we get a network that performs very well on this kind of data. We publish a paper; we get very nice results. However, we then apply the network in practice, and we see that it falls short of our expectations.

So we investigate it more closely. And what we can see here is that color is actually not the discriminative feature. If we inspect the data more closely, we can see that the feature that actually discriminates malignant and benign lesions is their shape. And what is important here is that we could have picked up on this feature already in the first data set; it was just easier to find color. But color was a wrong correlation, a spurious correlation, that we saw in this subset of our data. So without any additional knowledge, and given just this first data set, we can't really say the AI did anything wrong, because it basically followed the data. It just found a shortcut that did not translate and did not generalize to a new setting.
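
To make this shortcut effect concrete, here is a minimal, hypothetical sketch in Python (using NumPy and scikit-learn; it is not from the talk itself, just an illustration under simple assumptions): a toy "lesion" data set in which a color feature is almost perfectly correlated with the label at the training hospital, while shape is the genuinely discriminative feature. A standard classifier leans on the color shortcut and loses accuracy once that correlation is broken at a new hospital.

```python
# Illustrative sketch of shortcut learning, not the speaker's actual experiment.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_lesion_data(n, color_label_agreement):
    """Toy lesions: label 1 = malignant.
    Shape is always (noisily) tied to the label; color agrees with the label
    only with probability `color_label_agreement`."""
    y = rng.integers(0, 2, size=n)
    shape = y + rng.normal(0.0, 0.8, size=n)            # genuine but noisy signal
    agrees = rng.random(n) < color_label_agreement
    color = np.where(agrees, y, 1 - y) + rng.normal(0.0, 0.1, size=n)  # shortcut
    return np.column_stack([color, shape]), y

# Training hospital: color almost perfectly mirrors the label (spurious shortcut).
X_train, y_train = make_lesion_data(2000, color_label_agreement=0.98)
# New hospital: the color shortcut no longer holds; only shape stays informative.
X_test, y_test = make_lesion_data(2000, color_label_agreement=0.5)

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy at the training hospital:", clf.score(X_train, y_train))
print("accuracy at the new hospital     :", clf.score(X_test, y_test))
print("learned weights [color, shape]   :", clf.coef_[0])
```

The point of the sketch is only that nothing breaks during training: the shortcut genuinely works on the first data set, and the failure becomes visible only once the data distribution changes.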

Now, this might have been a relatively artificial example, but this can also happen in practice. In a recent study, researchers investigated how deep learning can be used for pneumonia detection in chest X-rays. And they were able to show that networks can very easily pick up certain markers and discriminate based on small X-ray markers that are placed in the image. Now, these markers differ between hospitals, so the networks are basically able to differentiate where the images come from. And imagine that you

Part of the chapter: Hybrid CKI Conference: 16 October 2020
Accessible via: Open access
Duration: 00:24:41 min
Recording date: 2020-10-16
Uploaded on: 2020-10-26 17:07:03
Language: en-US